
fix: correct training entry point, env config, and GPU defaults for VAGEN #101

Merged: abrichr merged 1 commit into main from fix/training-script-blockers on Mar 4, 2026

Conversation

@abrichr (Member) commented Mar 4, 2026

Summary

Fixes 5 blockers that prevent verl-agent training scripts from running on g5.xlarge (single A10G GPU):

  • Fix A (n_gpus default): Changed the default from 2 to 1 in train_verl_e2e.py, train_waa_vagen.yaml, and the vm_cli.py gpu-train parser. g5.xlarge has 1 GPU; multi-GPU runs (4 GPUs) belong on g5.12xlarge.
  • Fix B (n_envs): Changed from 8 to 1 in train_waa_vagen.yaml. We have a single WAA VM; GRPO group size is controlled by rollout.n, not n_envs.
  • Fix C (training entry point): Changed from verl.trainer.main_ppo to vagen.main_ppo with Hydra --config-path and --config-name args. VAGEN has its own entry point.
  • Fix D (generated config): _generate_training_config now emits only the envs section (env spec YAML), not the full training config. Algorithm, trainer, and rollout settings are Hydra overrides on the command line. data.train_files/data.val_files reference the env spec.
  • Fix E (rollout config): Added VAGEN-required Hydra overrides: multi_turn.enable=True, rollout.n for GRPO group size, FSDP param/optimizer offload, gradient checkpointing, total_training_steps (replaces total_epochs), save_freq, val_before_train, and evaluate_url logging.

Test plan

  • uv run pytest tests/test_verl_env.py -v: all 38 tests pass
  • Verify training launches on g5.xlarge with python scripts/train_verl_e2e.py --gpu-ip <IP> --skip-setup --task-id <UUID>

🤖 Generated with Claude Code


Five blockers for running verl-agent training on g5.xlarge:

A) n_gpus default: 2 -> 1 (g5.xlarge has 1 GPU; multi-GPU is for g5.12xlarge)
   - train_verl_e2e.py argparse default
   - train_waa_vagen.yaml trainer.n_gpus_per_node
   - vm_cli.py gpu-train --n-gpus default
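A minimal sketch of the new single-GPU default, assuming vm_cli.py builds its gpu-train subcommand with argparse. Only the `--n-gpus` flag comes from this PR; the function name `build_vm_cli_parser` and the parser layout are hypothetical.

```python
import argparse


def build_vm_cli_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the vm_cli.py parser; only the gpu-train
    subcommand and its --n-gpus option are modeled."""
    parser = argparse.ArgumentParser(prog="vm_cli.py")
    subparsers = parser.add_subparsers(dest="command", required=True)

    gpu_train = subparsers.add_parser("gpu-train", help="launch training on the GPU VM")
    # Default to one GPU because g5.xlarge exposes a single A10G.
    # On a g5.12xlarge, pass --n-gpus 4 explicitly.
    gpu_train.add_argument("--n-gpus", type=int, default=1,
                           help="GPUs per node (default: 1, for g5.xlarge)")
    return parser


if __name__ == "__main__":
    args = build_vm_cli_parser().parse_args(["gpu-train"])
    print(args.n_gpus)  # 1
```

Keeping the default at 1 means the common single-GPU case needs no flag, and larger instances opt in explicitly.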

B) n_envs: 8 -> 1 (single WAA VM; GRPO group size is rollout.n, not n_envs)
   - train_waa_vagen.yaml envs[0].n_envs
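The reason n_envs is not the group size: GRPO samples each prompt rollout.n times and normalizes each response's reward against the other responses in the same group. The following is a minimal sketch of that outcome-level normalization in plain Python. The function name and `eps` are illustrative, and it is not VAGEN's or verl's code. It uses the population standard deviation, while some implementations use the sample standard deviation.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards: list[float], group_size: int, eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: (r - mean(group)) / (std(group) + eps).

    `rewards` holds num_prompts * group_size scalar rewards laid out so that
    each consecutive block of `group_size` entries answers the same prompt.
    `group_size` plays the role of rollout.n; the number of parallel
    environments (n_envs) never enters the computation.
    """
    if group_size < 1 or len(rewards) % group_size != 0:
        raise ValueError("len(rewards) must be a positive multiple of group_size")
    advantages: list[float] = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        mu, sigma = mean(group), pstdev(group)
        advantages.extend((r - mu) / (sigma + eps) for r in group)
    return advantages


if __name__ == "__main__":
    # One prompt, four sampled responses (two succeed, two fail).
    print(grpo_advantages([1.0, 0.0, 1.0, 0.0], group_size=4))
```

In this sketch, a group of one response always gets an advantage of 0. This shows why the comparison group has to come from repeated sampling (rollout.n) even when only one WAA VM is available.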

C) Training entry point: verl.trainer.main_ppo -> vagen.main_ppo
   - VAGEN has its own entry point with Hydra config support
   - Added --config-path and --config-name Hydra args
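A sketch of how the launch command might be assembled. It assumes the entry point is run as a module (`python -m vagen.main_ppo`). `--config-path` and `--config-name` are Hydra's standard flags for selecting the primary config. The helper name and the example config location and name are hypothetical.

```python
import shlex
import sys


def build_entry_argv(config_path: str, config_name: str, overrides: list[str]) -> list[str]:
    """Assemble argv for VAGEN's entry point (module name from this PR).

    Hydra consumes --config-path / --config-name to pick the primary config;
    every remaining argument is a plain key=value override.
    """
    return [
        sys.executable, "-m", "vagen.main_ppo",
        "--config-path", config_path,
        "--config-name", config_name,
        *overrides,
    ]


if __name__ == "__main__":
    # Hypothetical absolute config directory and config name.
    argv = build_entry_argv("/opt/vagen/config", "ppo_trainer", ["trainer.n_gpus_per_node=1"])
    print(shlex.join(argv))
```

An absolute `--config-path` avoids ambiguity about the directory against which Hydra resolves a relative path.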

D) Generated config: full training config -> env spec only
   - _generate_training_config now emits only the envs section
   - Algorithm, trainer, and rollout settings are Hydra overrides on CLI
   - data.train_files/val_files point to the env spec YAML
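A sketch of the env-spec-only generator and the data.* overrides that point at it. This is a hypothetical stand-in for _generate_training_config. Apart from `envs` and `n_envs`, which come from this PR, every field name (`name`, `config`, `task_id`) and the file name `env_spec.yaml` are illustrative placeholders, not VAGEN's real schema.

```python
import json
import tempfile
from pathlib import Path


def generate_env_spec(task_id: str, n_envs: int = 1) -> dict:
    """Return only the `envs` section; algorithm, trainer, and rollout
    settings are left to Hydra overrides on the command line."""
    return {
        "envs": [
            {
                "name": "waa_desktop",            # placeholder env name
                "n_envs": n_envs,                 # one WAA VM -> one env
                "config": {"task_id": task_id},   # placeholder field
            }
        ]
    }


def write_env_spec(spec: dict, out_dir: Path) -> tuple[Path, list[str]]:
    """Write the spec and return data.* overrides that reference it.

    json.dumps output is valid YAML flow syntax, so the .yaml file still
    parses as YAML; swap in a YAML dumper for block style if preferred.
    """
    path = out_dir / "env_spec.yaml"
    path.write_text(json.dumps(spec, indent=2))
    overrides = [f"data.train_files={path}", f"data.val_files={path}"]
    return path, overrides


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        spec_path, data_overrides = write_env_spec(generate_env_spec("example-task"), Path(tmp))
        print(spec_path.read_text())
        print(data_overrides)
```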

E) Rollout config: added VAGEN-required Hydra overrides
   - multi_turn.enable=True for multi-step desktop tasks
   - rollout.n={group_size} for GRPO group size
   - FSDP param/optimizer offload for single-GPU memory
   - gradient checkpointing enabled
   - total_training_steps replaces total_epochs (VAGEN uses steps)
   - Added evaluate_url to log output
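A sketch of the single-GPU rollout and trainer overrides as Hydra `key=value` strings. The PR abbreviates the keys (for example `rollout.n`). The full dotted paths below follow upstream verl's PPO trainer config layout and are an assumption, so check them against VAGEN's own config before use. `val_before_train` is left out because the PR does not state its value.

```python
def build_rollout_overrides(group_size: int, total_steps: int, save_freq: int,
                            n_gpus: int = 1) -> list[str]:
    """Hydra overrides for a single-GPU VAGEN run (key paths assumed)."""
    return [
        # Multi-step desktop tasks need multi-turn rollouts.
        "actor_rollout_ref.rollout.multi_turn.enable=True",
        # GRPO group size: responses sampled per prompt.
        f"actor_rollout_ref.rollout.n={group_size}",
        # Offload FSDP params and optimizer state to CPU to fit one 24 GB A10G.
        "actor_rollout_ref.actor.fsdp_config.param_offload=True",
        "actor_rollout_ref.actor.fsdp_config.optimizer_offload=True",
        # Trade recomputation for activation memory.
        "actor_rollout_ref.model.enable_gradient_checkpointing=True",
        f"trainer.n_gpus_per_node={n_gpus}",
        # VAGEN counts training in steps rather than epochs.
        f"trainer.total_training_steps={total_steps}",
        f"trainer.save_freq={save_freq}",
    ]


if __name__ == "__main__":
    for override in build_rollout_overrides(group_size=4, total_steps=100, save_freq=20):
        print(override)
```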

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@abrichr abrichr merged commit ac7437d into main Mar 4, 2026
1 check passed